ABSTRACT


Estimates of the parameters in a linear model are considered,
based upon the minimization of a dispersion function of the
residuals. The dispersion function used depends on Walsh averages of
pairs of residuals. Results similar to those arising with signed
rank statistics can be obtained as a special case. Trimming and
weighting of the Walsh averages can occur with a suitable choice
of dispersion function. Asymptotic properties of this type of
dispersion function and its derivatives are developed and used to
determine the large sample distribution of the estimates. Some
discussion appears on the practical application of this
methodology.


Key Words and Phrases: M-estimation; Walsh averages; dispersion
function; signed rank statistics; robustness


1. INTRODUCTION


This paper is concerned with the development of robust
statistical methods based on Walsh averages. The results are
broad enough to include many of the familiar results on Walsh
averages that arise with signed rank procedures and also to allow
for extensions, for instance to trimmed or weighted Walsh
averages. The framework of a general linear model is used in the
development so that applications can be made to a wide range of
statistical problems including one- and two-sample problems,
multiple regression problems and analysis of variance and
covariance problems. The emphasis will be on the estimation
problem, although the large sample distributional results can be used
to specify tests of hypotheses in a natural way.

The general linear model is given by

$$Y = X\beta + e, \tag{1.1}$$

where $Y = (Y_1, \ldots, Y_n)'$, $X = (x_{ij})$ is an $n \times p$ design
matrix, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ parameter vector and
$e = (e_1, \ldots, e_n)'$ is an $n \times 1$ vector of independent,
identically distributed error random variables with density
function $f(y)$. It is assumed that $f(y)$ is symmetric about zero.
Residuals are denoted by $Z = (Z_1, \ldots, Z_n)'$ where

$$Z = Z(b) = Y - Xb.$$

Consider estimating the parameter $\beta$ by minimizing a measure
of dispersion of the residuals. In the least-squares approach
the sum of squares $\sum_i Z_i^2$ is used as the dispersion function.
It is well known that the least-squares estimate is not
robust. It can be inefficient and heavily influenced by outliers
in the presence of nonnormal error distributions. The robust
M-estimates developed by Huber (1964, 1972, 1973) arise by
minimizing a dispersion function $\sum_i \rho(Z_i)$ for a suitably chosen
convex function $\rho$. To attain a degree of robustness the
function should increase at a lesser rate than the quadratic function
in its tails. The $L_1$ or least absolute deviation method
minimizes $\sum_i |Z_i|$. The dispersion function $\sum_i a(R_i^+)|Z_i|$, where
$a(\cdot)$ is a score function and $R_i^+$ is the rank of $Z_i$ in
absolute value, generates an estimate of $\beta$ based on signed-rank
statistics; see Adichie (1967, 1978).

The basic dispersion function to be considered here measures
variability in the Walsh averages of the residuals with

$$D = D(b) = \sum_{i<j} w_{ij}\,\rho(Z_i + Z_j), \tag{1.2}$$

where $\rho$ is a convex function. For convenience, the "2" in the
denominator of Walsh averages has been absorbed in the $\rho$
function. The constants $w_{ij} \ge 0$ are weights reflecting the
importance of individual Walsh averages. The weights can depend
on the design matrix. Zero-one weights can be used to omit some
Walsh averages from consideration.

For the present, three potential $\rho$ functions will be
mentioned. The first is simply

$$\rho_1(t) = |t|. \tag{1.3}$$

If this $\rho$ function is used in (1.2) with weights $w_{ij} = 1$,
the dispersion function is very similar to that of the signed
rank approach with Wilcoxon scores. For example, in the
one-sample problem with $Y_i = \theta + e_i$ the dispersion function
using $\rho_1(t)$ is minimum at the median of the Walsh averages
$(Y_i + Y_j)/2$, for $i < j$, which is essentially the signed rank
estimate of $\theta$.
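
The one-sample claim above can be illustrated numerically. The following is a minimal sketch, assuming unit weights and a small made-up data vector; the function names `walsh_averages` and `dispersion_abs` are introduced here for illustration and are not from the paper.

```python
from itertools import combinations
from statistics import median


def walsh_averages(y):
    """Walsh averages (y_i + y_j) / 2 over pairs i < j, as used in the text."""
    return [(a + b) / 2.0 for a, b in combinations(y, 2)]


def dispersion_abs(y, theta):
    """D(theta) = sum over i < j of |y_i + y_j - 2*theta| (unit weights)."""
    return sum(abs(a + b - 2.0 * theta) for a, b in combinations(y, 2))


# Illustrative data (hypothetical), including one unusually large value.
y = [1.2, -0.4, 0.7, 3.9, 0.1]
theta_hat = median(walsh_averages(y))

# Compare against a grid of nearby candidates; none should give a smaller D.
grid = [theta_hat + d / 100.0 for d in range(-200, 201)]
best_on_grid = min(dispersion_abs(y, t) for t in grid)
print(theta_hat, dispersion_abs(y, theta_hat), best_on_grid)
```

With an even number of Walsh averages the minimizer is not unique: any point between the two middle averages gives the same dispersion, and the median is the conventional choice.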

Another $\rho$ function is

$$\rho_2(t) = \max\{|t| - c,\ 0\} \tag{1.4}$$

for some $c > 0$. This function is zero on the interval $[-c, c]$
and in effect trims "middle" Walsh averages that are sufficiently
near zero. However, $\rho_2(t)$ can also be viewed as a simple
modification of $\rho_1(t)$ which flattens its abrupt behavior at
$t = 0$. A consequence of this modification may be that the
standard error of $\hat\beta$ becomes more stable, but this conjecture
needs further examination.

Huber's $\rho$ function can also be used in the dispersion
(1.2). It is quadratic in the middle with linear tails and is
given by

$$\rho_3(t) = \begin{cases} t^2/2 & \text{if } |t| \le k, \\ k|t| - k^2/2 & \text{if } |t| > k, \end{cases} \tag{1.5}$$

for some $k > 0$.
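
As a concrete reading of (1.2) through (1.5), here is a small sketch of the three $\rho$ functions and the dispersion $D(b)$. The default constants `c=1.0` and `k=1.5`, the function names, and the list-of-rows layout for `X` are assumptions made for this illustration, not choices taken from the paper.

```python
from itertools import combinations


def rho1(t):
    """Absolute value, equation (1.3)."""
    return abs(t)


def rho2(t, c=1.0):
    """Zero on [-c, c], linear outside, equation (1.4)."""
    return max(abs(t) - c, 0.0)


def rho3(t, k=1.5):
    """Huber's function, equation (1.5): quadratic centre, linear tails."""
    a = abs(t)
    return 0.5 * t * t if a <= k else k * a - 0.5 * k * k


def dispersion(y, X, b, rho, w=None):
    """D(b) = sum_{i<j} w_ij * rho(Z_i + Z_j) with Z = y - X b.

    X is a list of rows; w is an optional n x n table of weights
    (only entries with i < j are read). Unit weights if w is None.
    """
    n = len(y)
    z = [y[i] - sum(X[i][k] * b[k] for k in range(len(b))) for i in range(n)]
    total = 0.0
    for i, j in combinations(range(n), 2):
        wij = 1.0 if w is None else w[i][j]
        total += wij * rho(z[i] + z[j])
    return total


# Tiny hypothetical example: residuals are (0, 0, 1) at b = (1, 1).
y = [1.0, 2.0, 4.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
for rho in (rho1, rho2, rho3):
    print(rho.__name__, dispersion(y, X, b, rho))
```

The double loop makes the cost of one evaluation quadratic in $n$, which is the price of working with pairs of residuals rather than single residuals.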

The above $\rho$ functions suggest what might be accomplished
by the use of the dispersion function (1.2). With $\rho_1(t)$ and no
weights the estimate should be similar to that arising with the
signed rank dispersion function. The use of weights allows
broader possibilities and the modification to $\rho_2(t)$ may prove
useful. On the other hand, the use of $\rho_3(t)$ suggests this to
be an extension of the M-estimate approach (Huberizing Walsh
averages). Huber (1964) had mentioned this type of idea at the
end of his first paper.


2. THE MAIN RESULTS

In this section the basic notation is introduced and the
assumptions are listed. The basic focus will be on the
derivative of the dispersion function (1.2). Theorem 2.1 shows that
this derivative has a multivariate normal limiting distribution.
This is extended to the case of contiguous distributions in
Theorem 2.4. These results are useful in developing test
statistics for testing hypotheses about $\beta$. An asymptotic linearity
result is given in Theorem 2.3 and this is used to derive the
limiting normality of the estimate $\hat\beta$ in Theorem 2.5.

Some assumptions will be listed concerning the design matrix
$X$ and the weights used in the dispersion function. Extend the
definition of the weights to the case of $i > j$ by defining
$w_{ji} = w_{ij}$. Also let $w_{ii} = \sum_{j \ne i} w_{ij}$ and define an $n \times n$ weight
matrix $W = (w_{ij})$. Then $W$ is a symmetric matrix with weights
$w_{ij}$ in the off-diagonal locations and its diagonal elements are
the sums of the off-diagonal elements in the corresponding row.
Further, define $a_{ij}(k) = w_{ij}(x_{ik} + x_{jk})$ and let
$A_i(k) = \sum_{j \ne i} a_{ij}(k)$. A calculation shows that the $n \times p$ matrix having
$A_i(k)$ for its $(i,k)$th element is given by $A = WX$.
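
The identity $A = WX$ can be checked directly on a small example. The sketch below builds $W$ from a set of pairwise weights and compares the elementwise definition of $A_i(k)$ with the matrix product; the data, the weights and the helper names are hypothetical.

```python
def weight_matrix(w_upper, n):
    """Symmetric W from weights w_upper[(i, j)] with i < j.

    Each diagonal entry is the sum of the off-diagonal entries in its row.
    """
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in w_upper.items():
        W[i][j] = W[j][i] = w
    for i in range(n):
        W[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return W


def A_direct(W, X):
    """A_i(k) = sum_{j != i} w_ij * (x_ik + x_jk), computed from the definition."""
    n, p = len(X), len(X[0])
    return [[sum(W[i][j] * (X[i][k] + X[j][k]) for j in range(n) if j != i)
             for k in range(p)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][m] * B[m][k] for m in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]


# Hypothetical design (intercept plus one covariate) and pairwise weights.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 4.0]]
w_upper = {(0, 1): 1.0, (0, 2): 0.5, (0, 3): 0.0,
           (1, 2): 2.0, (1, 3): 1.0, (2, 3): 1.0}
W = weight_matrix(w_upper, len(X))
A1 = A_direct(W, X)
A2 = matmul(W, X)
print(A1)
print(A2)
```

The two results agree because the diagonal entry $w_{ii}$ supplies exactly the $\sum_{j \ne i} w_{ij} x_{ik}$ part of $A_i(k)$.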

ASSUMPTION (A1):

$$(1/n)\,X'X \to \Sigma$$

as $n \to \infty$, where $\Sigma$ is a $p \times p$ positive definite matrix.


ASSUMPTION (A2): For each $k = 1, \ldots, p$,

$$\max_{1 \le i \le n} x_{ik}^2 / n \to 0 \quad \text{as } n \to \infty.$$


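ASSUMPTION (A3): For each $k = 1, \ldots, p$,

$$\frac{\max_{1 \le i \le n} A_i^2(k)}{\sum_{i=1}^n A_i^2(k)} \to 0 \quad \text{as } n \to \infty.$$


ASSUMPTION (A4): For each $k = 1, \ldots, p$,

$$\frac{\sum_{i<j} a_{ij}^2(k)}{\sum_{i=1}^n A_i^2(k)} \to 0 \quad \text{as } n \to \infty.$$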


ASSUMPTION (A5):

$$n^{-3}\,X'WWX \to V$$

as $n \to \infty$, where $V$ is a $p \times p$ positive definite matrix.


ASSUMPTION (A6):

$$n^{-2}\,X'WX \to C$$

as $n \to \infty$, where $C$ is a $p \times p$ nonsingular matrix.


The following assumptions concern the $\rho$ function in the
dispersion (1.2). They are sometimes motivated by the approach
used in the proofs and alternate assumptions could be specified
with different proof techniques. The basic requirement is that
$\rho$ and its derivative be sufficiently smooth (piecewise). In many
cases of practical interest $\rho$ has a bounded, piecewise
continuous derivative and then the assumptions could be considerably
simplified.

ASSUMPTION (B1): $\rho(t)$ is a convex function, symmetric about
zero, with a derivative $\psi(t) = \rho'(t)$ except at possibly a
finite number of points. This implies that $\psi(t)$ is
nondecreasing and $\psi(-t) = -\psi(t)$.

ASSUMPTION (B2): $h(t) = E_0(\psi(Y_1 + t))$ exists and is finite
for all $t$, where the expectation is under the assumption that
$\beta = 0$ in model (1.1).

ASSUMPTION (B3): The following expectations are positive and
finite:

$\tau^2 = E_0(h^2(Y_1)) = E_0(\psi(Y_1 + Y_2)\,\psi(Y_1 + Y_3))$,
$\tau_1^2 = E_0(\psi^2(Y_1 + Y_2))$ and
$E((h'(Y_1))^2)$.

ASSUMPTION (B4): The first and second derivatives $h'(t)$ and
$h''(t)$ exist except possibly at a finite number of points and
$|h''(t)| \le M$ for some constant $M$.

ASSUMPTION (B5): $H(t) = E_0(h(Y_1 - t)) = E_0(\psi(Y_1 + Y_2 - t))$
and its derivative exist in a neighborhood of $t = 0$. Moreover,
$H'(t)$ is continuous at $t = 0$ and $H'(0) = -E_0(h')$.


ASSUMPTION (B6): For some constant $M_1$, $E_0(\psi^2(Y_1 + Y_2 - t)) < M_1$
in some neighborhood of $t = 0$.


The behavior of the dispersion function (1.2) can be studied
through the vector of its derivatives. The negatives of these
derivatives will be denoted by $T(b) = (T_1(b), \ldots, T_p(b))'$
where

$$T_k(b) = -\,\partial D / \partial b_k = \sum_{i<j} a_{ij}(k)\,\psi(Z_i + Z_j) \tag{2.1}$$

for $k = 1, \ldots, p$, where $\psi = \rho'$ and
$a_{ij}(k) = w_{ij}(x_{ik} + x_{jk})$.

The asymptotic distribution of $T(b)$ will be treated by the
projection method. It will be sufficient to assume $\beta = 0$ in
model (1.1), in which case the $Y_i$ are iid with symmetric
density $f$. The $k$th coordinate of $T(0)$ has projection

$$T_k^*(0) = \sum_{t=1}^n E_0(T_k(0) \mid Y_t = y_t) = \sum_{i=1}^n A_i(k)\,h(Y_i), \tag{2.2}$$

where $h(t)$ is defined in assumption (B2). Note that
$E_0(\psi(Y_i + Y_j)) = 0$ was used since $\psi$ is an odd function under
assumption (B1) and $f$ is symmetric. Thus the projection of $T(0)$ is
$T^*(0) = (T_1^*(0), \ldots, T_p^*(0))'$. In matrix form,

$$T^*(0) = A'H = X'WH$$

where $H = (h(Y_1), \ldots, h(Y_n))'$.


THEOREM 2.1. Let assumptions (A3) - (A5) and (B1) - (B3)
hold. Then for any fixed vector $\theta = (\theta_1, \ldots, \theta_p)'$,

(i) $n^{-3/2}(\theta'T(0) - \theta'T^*(0))\,|_0 \overset{p}{\to} 0$ and

(ii) $n^{-3/2}\,T(0)\,|_0 \overset{d}{\to} N(0, \tau^2 V)$ as $n \to \infty$.


Proof: First let

$$u = WX\theta = (u_1, \ldots, u_n)'. \tag{2.3}$$

Then $n^{-3/2}\,\theta'T^*(0) = n^{-3/2}\,u'H$ is a sum of independent random
variables with mean 0 and variance

$$n^{-3}\tau^2 \sum_i u_i^2 = n^{-3}\tau^2\,\theta'X'WWX\theta \to \tau^2\,\theta'V\theta$$

by assumption (A5). It will have a limiting normal distribution if

$$\max_{1 \le i \le n} u_i^2 \Big/ \sum_{i=1}^n u_i^2 \to 0 \quad \text{as } n \to \infty.$$

But this follows from assumptions (A3) and (A5).
Thus $n^{-3/2}\,T^*(0)\,|_0 \overset{d}{\to} N(0, \tau^2 V)$ and part (ii) will follow from
part (i).

For part (i) examine the expected square

$$n^{-3}E_0(\theta'T(0) - \theta'T^*(0))^2 = n^{-3}\left(E_0(\theta'T(0))^2 - E_0(\theta'T^*(0))^2\right)$$

$$= (\tau_1^2 - 2\tau^2)\,(n^{-3}u'u)\left(\sum_{i<j} u_{ij}^2 \Big/ \sum_i u_i^2\right),$$

where $u_{ij} = \sum_k \theta_k\,a_{ij}(k)$. The middle factor converges to a
constant as in the previous paragraph and the last factor tends
to zero by assumptions (A4) and (A5). Thus part (i) follows.
THEOREM 2.2. Let assumptions (A1), (A3) - (A5) and (B1) -
(B4) hold. Let $\Delta = (\Delta_1, \ldots, \Delta_p)'$ and

$$R(\Delta) = n^{-3/2}\left(T(\Delta/\sqrt{n}) - T(0) + E_0(h')\,X'WX\Delta/\sqrt{n}\right).$$

Then $R(\Delta)\,|_0 \overset{p}{\to} 0$ as $n \to \infty$.

Proof: First extend the definition of $T^*(0)$ in (2.2) by
defining $T^*(b)$ to have $k$th element

$$T_k^*(b) = \sum_{i=1}^n A_i(k)\,h(Z_i),$$

where $Z = Y - Xb$. Note that $T(b)$ has the translation property

$$T(b_1)\,|_{b_2} \overset{d}{=} T(b_1 - b_2)\,|_0 \overset{d}{=} T(0)\,|_{b_2 - b_1}$$

and so also does $T^*(b)$. Then Theorem 2.1 (i) and a contiguity
argument show that

$$n^{-3/2}(\theta'T(0) - \theta'T^*(0))\,|_{-\Delta/\sqrt{n}} \overset{p}{\to} 0.$$

Using the translation property it follows that

$$n^{-3/2}(\theta'T(\Delta/\sqrt{n}) - \theta'T^*(\Delta/\sqrt{n}))\,|_0 \overset{p}{\to} 0.$$

Thus it is sufficient to replace $T$ by $T^*$ in verifying that
$\theta'R(\Delta)\,|_0 \overset{p}{\to} 0$.

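Define an $n \times 1$ vector of constants $t = X\Delta/\sqrt{n}$. Then
with $u$ as in (2.3) use a Taylor's approximation to write

$$\theta'T^*(\Delta/\sqrt{n}) = \sum_i u_i\,h(Y_i - t_i) = \sum_i u_i\,h(Y_i) - \sum_i u_i t_i\,h'(Y_i) + \sum_i u_i t_i^2\,h''(\xi_i)/2$$

where $|\xi_i - Y_i| \le |t_i|$, $i = 1, \ldots, n$. Then, writing $R^*$ for $R$ with $T$ replaced by $T^*$,

$$\theta'R^*(\Delta) = n^{-3/2}\left(\theta'T^*(\Delta/\sqrt{n}) - \theta'T^*(0) + E_0(h')\,\theta'X'WX\Delta/\sqrt{n}\right)$$

$$= n^{-3/2}\left(\sum_i u_i\,h(Y_i - t_i) - \sum_i u_i\,h(Y_i) + E_0(h')\sum_i u_i t_i\right)$$

$$= -n^{-3/2}\sum_i u_i t_i\,(h'(Y_i) - E_0(h')) + n^{-3/2}\sum_i u_i t_i^2\,h''(\xi_i)/2$$

$$= S_1 + S_2, \text{ say.}$$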

Now $S_1$ is a sum of independent random variables with mean zero
and variance

$$n^{-3}\sum_i u_i^2 t_i^2\,\mathrm{Var}(h'(Y_i)) \le (n^{-3}u'u)\left(\max_{1 \le i \le n} u_i^2 \Big/ \sum_i u_i^2\right)\left(\sum_i t_i^2\right)\mathrm{Var}(h'(Y_1)).$$

The first factor here converges to a constant and the second
factor converges to zero as in the proof of Theorem 2.1. Also
$\sum_i t_i^2 = \Delta'X'X\Delta/n \to \Delta'\Sigma\Delta$ by assumption (A1). With its variance
tending to zero, $S_1 \overset{p}{\to} 0$. Using the bound on $h''$ in
assumption (B4), the term $S_2$ is bounded by a constant which
tends to zero,

$$|S_2| \le n^{-3/2}\max_{1 \le i \le n}|u_i|\;M\sum_i t_i^2/2.$$

Thus $S_2 \overset{p}{\to} 0$ and the proof is completed.


The previous theorem shows that $T(b)$ can be
approximated by a linear function of $b$ for $b$ near zero.
However, the result is not strong enough for the application
needed here. The following theorem shows that this result holds
uniformly. A proof will not be given as it is quite lengthy and
the details follow closely the compactification argument used in
the proof of Theorem 5.1 of Sievers (1983).

THEOREM 2.3. Let assumptions (A1) - (A5) and (B1) - (B6)
hold. Let $D = \{(\Delta_1, \ldots, \Delta_p): |\Delta_k| \le c,\ 1 \le k \le p\}$, where
$c > 0$, and let $\|\cdot\|$ denote Euclidean distance. Then

$$\sup_{\Delta \in D} \|R(\Delta)\| \overset{p}{\to} 0 \quad \text{as } n \to \infty.$$

The asymptotic distribution of $T(0)$ given in Theorem 2.1
can be extended to the case of contiguous distributions. The
result follows readily from Theorem 2.2 and is summarized in the
following theorem.

THEOREM 2.4. Let assumptions (A1), (A3) - (A6) and
(B1) - (B4) hold. Then as $n \to \infty$,

$$n^{-3/2}\,T(0)\,|_{\Delta/\sqrt{n}} \overset{d}{\to} N(E_0(h')\,C\Delta,\ \tau^2 V).$$

Finally, the limiting distribution of the estimate $\hat\beta$ can
be given. With the asymptotic linearity result of Theorem 2.3,
the argument of Jaeckel (1972) and Sievers (1983) can be used.
First note a translation property of the estimate,

$$\sqrt{n}(\hat\beta - \beta)\,|_\beta \overset{d}{=} \hat\Delta\,|_0,$$

where $\hat\Delta$ minimizes $D^*(\Delta) = D(\Delta/\sqrt{n})/n$.
The asymptotic linearity implies

$$\sup_{\Delta \in D} |D^*(\Delta) - Q(\Delta)|\;|_0 \overset{p}{\to} 0,$$

where $Q$ is the quadratic function

$$Q(\Delta) = E_0(h')\,\Delta'C\Delta/2 - n^{-3/2}\,\Delta'T(0) + D^*(0).$$

From this it follows that $\hat\Delta$ is asymptotically equivalent to the
point minimizing $Q(\Delta)$. The following theorem summarizes.

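THEOREM 2.5. Let assumptions (A1) - (A6) and (B1) -
(B6) hold. Then as $n \to \infty$,

$$\sqrt{n}(\hat\beta - \beta) \overset{d}{\to} N\!\left(0,\ \frac{\tau^2}{(E_0(h'))^2}\,C^{-1}VC^{-1}\right).$$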

3. GENERAL COMMENTS


The regular M-estimate of $\beta$ minimizing $\sum_i \rho(Z_i)$ has an
influence function proportional to $\psi(y)$ and its asymptotic
variance-covariance matrix is $\left(E(\psi^2)/(E(\psi'))^2\right)\Sigma^{-1}$. The estimate
of $\beta$ minimizing the dispersion (1.2) has an influence
function $h(y)$, which is a smoothed version of $\psi(y)$, and its
variance-covariance matrix, given in Theorem 2.5, may have a factor
larger or smaller than that of the regular M-estimate. Some
examples of these quantities appear in the next section.

There is special interest in conditions under which the
matrix $C^{-1}VC^{-1}$ appearing in Theorem 2.5 equals $\Sigma^{-1}$. If
this is the case, the variances of $\hat\beta$ can be compared to the
variances of regular M-estimates and least-squares estimates
simply by the constant multiples of this matrix. An answer to
this question can be given for the unweighted case, $w_{ij} = 1$. In
this case $W = (n-2)I + J$, where $I$ is an identity matrix and
$J$ a matrix of "ones". Then $C = \Sigma + \mu\mu'$ and $V = \Sigma + 3\mu\mu'$,
where $\mu$ is the limit of the column means of $X$. Then a
sufficient condition for $C^{-1}VC^{-1} = \Sigma^{-1}$, or equivalently
$V = C\,\Sigma^{-1}C$, is given by

$$\mu'\Sigma^{-1}\mu = 1, \tag{3.1}$$

as can be seen by direct multiplication: $C\,\Sigma^{-1}C = \Sigma + (2 + \mu'\Sigma^{-1}\mu)\,\mu\mu'$.
This condition is easier to verify in particular cases than the basic equation
itself.
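
Condition (3.1) can be checked numerically for a design with an intercept column. The sketch below uses the finite-sample matrix $X'X/n$ and the finite-sample column means as stand-ins for the limits $\Sigma$ and $\mu$; this substitution, the data and the 2 x 2 helper functions are assumptions made for the illustration.

```python
def mat2_inv(M):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat2_mul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical covariate values; X has an intercept column and one covariate.
x = [0.3, 1.1, 2.0, 2.7, 4.4, 5.0]
X = [[1.0, xi] for xi in x]
n = len(X)

Sigma = [[sum(r[a] * r[b] for r in X) / n for b in range(2)] for a in range(2)]
mu = [sum(r[a] for r in X) / n for a in range(2)]
Sigma_inv = mat2_inv(Sigma)

# Left side of (3.1).
q = sum(mu[a] * Sigma_inv[a][b] * mu[b] for a in range(2) for b in range(2))

# Unweighted-case forms of C and V from the text.
C = [[Sigma[a][b] + mu[a] * mu[b] for b in range(2)] for a in range(2)]
V = [[Sigma[a][b] + 3.0 * mu[a] * mu[b] for b in range(2)] for a in range(2)]
CSC = mat2_mul(mat2_mul(C, Sigma_inv), C)

print(q)
print(V)
print(CSC)
```

With an intercept column the first column of $\Sigma$ equals $\mu$, so $\Sigma^{-1}\mu$ is the first unit vector and $\mu'\Sigma^{-1}\mu$ equals the first entry of $\mu$, which is 1.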

To estimate the standard errors of the estimates in $\hat\beta$ an
estimate is needed for the scale factor $\tau^2/(E_0(h'))^2$. Recalling
that $\tau^2 = E_0(\psi(Y_1 + Y_2)\,\psi(Y_1 + Y_3))$, a U-statistic is
suggested for the numerator. A symmetric kernel is given by

$$\psi^*(Y_1, Y_2, Y_3) = \left(\psi(Y_1 + Y_2)\psi(Y_1 + Y_3) + \psi(Y_1 + Y_2)\psi(Y_2 + Y_3) + \psi(Y_1 + Y_3)\psi(Y_2 + Y_3)\right)/3.$$

Then using residuals $\hat Z = Y - X\hat\beta$, a
consistent estimate of $\tau^2$ is

$$\hat\tau^2 = \binom{n}{3}^{-1}\sum_{i<j<k} \psi^*(\hat Z_i, \hat Z_j, \hat Z_k).$$

To estimate the denominator of the scale factor consider the case
where $h'(t) = \int \psi'(y + t)\,f(y)\,dy$. Then $E_0(h') = E_0(\psi'(Y_1 + Y_2))$
and a consistent, U-statistic estimate is given by

$$\binom{n}{2}^{-1}\sum_{i<j} \psi'(\hat Z_i + \hat Z_j).$$


The computational aspects discussed in Huber (1972, 1973)
for regular M-estimates could be modified for use in computing
$\hat\beta$ for $\rho$ functions satisfying his conditions. In particular,
a scale measure should be used with some $\rho$ functions, such as
(1.5). The process will be slower since the dispersion (1.2)
involves $\binom{n}{2}$ rather than $n$ terms.
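
The two U-statistic estimates of the scale factor discussed above can be sketched as follows. The code assumes Huber's $\psi$ with an illustrative constant `K = 1.5`, takes the residual vector as given, and averages the symmetric kernel over all triples (for $\tau^2$) and $\psi'$ over all pairs (for $E_0(h')$). The function names and the residual values are hypothetical.

```python
from itertools import combinations

K = 1.5  # Huber tuning constant (illustrative choice)


def psi(t):
    """Huber psi: the identity clipped to [-K, K]."""
    return max(-K, min(K, t))


def dpsi(t):
    """Derivative of Huber psi away from the kinks at +/- K."""
    return 1.0 if abs(t) < K else 0.0


def tau2_hat(z):
    """Average of the symmetric three-argument kernel over all triples i < j < k."""
    total, count = 0.0, 0
    for a, b, c in combinations(z, 3):
        p_ab, p_ac, p_bc = psi(a + b), psi(a + c), psi(b + c)
        total += (p_ab * p_ac + p_ab * p_bc + p_ac * p_bc) / 3.0
        count += 1
    return total / count


def eh_prime_hat(z):
    """Average of psi'(z_i + z_j) over all pairs i < j."""
    vals = [dpsi(a + b) for a, b in combinations(z, 2)]
    return sum(vals) / len(vals)


# Hypothetical residuals from some fitted model.
z_hat = [0.31, -0.12, 1.40, -0.85, 0.07, -2.60, 0.52, 0.95]
scale_factor = tau2_hat(z_hat) / eh_prime_hat(z_hat) ** 2
print(tau2_hat(z_hat), eh_prime_hat(z_hat), scale_factor)
```

The triple loop costs on the order of $n^3$ operations, so for large samples one might prefer to subsample triples; that variation is not discussed in the paper.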

There is another type of dispersion function that can be
used for the analysis of a linear model. Consider

$$D_1 = D_1(b) = \sum_{i<j} w_{ij}\,\rho(Z_j - Z_i).$$

Dispersion is measured by differences of residuals. This is a
generalization of the dispersion function considered in Sievers
(1983) where $\rho(t) = |t|$ was used. With no weights this Gini's
mean difference was shown by Hettmansperger and McKean (1978) to
generate the rank estimate of $\beta$ based on Wilcoxon scores. The
projection and asymptotic linearity approach of this paper can be
used with only minor changes to obtain the theoretical properties
for the estimate minimizing $D_1$. The results are basically the
same as Theorems 2.1 - 2.5 with some changes in the details.

Tests of hypotheses can be developed based on $T(0)$, $\hat\beta$ or
the dispersion function; see Hettmansperger and McKean (1977) and
Schrader and Hettmansperger (1980).


4.  EXAMPLES 


The introduction discussed three possible $\rho$ functions for
use in the dispersion (1.2). Further details on these functions
will be given in this section, in particular on the influence
function $h(y)$ and quantities appearing in the asymptotic
variance. Some comments are made on the one- and two-sample problems
and on the simple linear regression model.

The function $\rho_1(t) = |t|$ has derivative

$$\psi_1(t) = \begin{cases} +1 & \text{if } t > 0, \\ -1 & \text{if } t < 0. \end{cases}$$

Then the influence function is $h(t) = 2F(t) - 1$ and $\tau^2 = 1/3$.
Also $h'(t) = 2f(t)$ and $E_0(h') = 2\int f^2$. The asymptotic variance
factor is $1/\left(12\,(\int f^2)^2\right)$, the familiar result for signed rank
estimates.
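
For a specific error distribution these quantities can be evaluated numerically. The sketch below takes standard normal errors as an illustrative assumption and uses a simple trapezoid rule; the helper names are introduced here. For this case $\int f^2 = 1/(2\sqrt{\pi})$, so the variance factor should come out near $\pi/3$.

```python
import math


def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def trapezoid(g, lo, hi, m):
    """Composite trapezoid rule with m equal subintervals."""
    step = (hi - lo) / m
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, m):
        total += g(lo + i * step)
    return total * step


# Integral of f^2 and tau^2 = E[(2F(Y) - 1)^2] under standard normal errors.
int_f2 = trapezoid(lambda y: normal_pdf(y) ** 2, -10.0, 10.0, 20000)
tau2 = trapezoid(lambda y: (2.0 * normal_cdf(y) - 1.0) ** 2 * normal_pdf(y),
                 -10.0, 10.0, 20000)

eh_prime = 2.0 * int_f2                 # E_0(h') = 2 * integral of f^2
variance_factor = tau2 / eh_prime ** 2  # equals 1 / (12 * (integral of f^2)^2)
print(int_f2, tau2, variance_factor)
```

Since the least-squares variance factor is the error variance, which is 1 here, the ratio $3/\pi \approx 0.955$ is the well-known efficiency of Wilcoxon-type procedures relative to least squares at the normal model.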

The function $\rho_2(t)$ of (1.4) has derivative

$$\psi_2(t) = \begin{cases} -1 & \text{if } t < -c, \\ 0 & \text{if } |t| \le c, \\ +1 & \text{if } t > c. \end{cases}$$

Then the influence function is $h(t) = F(c + t) - F(c - t)$ and
$\tau^2$ is the expected square of this function. Also
$h'(t) = f(c + t) + f(c - t)$. The expected value of $h'$ can be
expressed as $E_0(h') = 2g(c)$, where $g(y)$ is the density
function of $Y_1 + Y_2$.
The Huber function $\rho_3(t)$ of (1.5) has derivative

$$\psi_3(t) = \begin{cases} -k & \text{if } t < -k, \\ t & \text{if } |t| \le k, \\ +k & \text{if } t > k. \end{cases}$$

The influence function is $h(t) = \int_{t-k}^{t+k} F(u)\,du - k$ and it can be
viewed as a smoothing of $\psi_3$. It readily can be seen that
$h'(t) = F(t + k) - F(t - k)$ and $E_0(h') = P_0(|Y_1 + Y_2| < k)$.
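
The closed form for the Huber influence function can be compared with its definition $h(t) = E_0(\psi_3(Y_1 + t))$ by numerical integration. The sketch below assumes standard normal errors and an illustrative constant `K = 1.5`; the helper names are hypothetical and the trapezoid rule is only an approximation.

```python
import math

K = 1.5  # Huber tuning constant (illustrative choice)


def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def trapezoid(g, lo, hi, m):
    """Composite trapezoid rule with m equal subintervals."""
    step = (hi - lo) / m
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, m):
        total += g(lo + i * step)
    return total * step


def psi3(t):
    return max(-K, min(K, t))


def h_by_definition(t):
    """h(t) = E[psi3(Y + t)] for standard normal Y, by numerical integration."""
    return trapezoid(lambda y: psi3(y + t) * normal_pdf(y), -12.0, 12.0, 48000)


def h_closed_form(t):
    """h(t) = integral of F(u) over [t - K, t + K], minus K."""
    return trapezoid(normal_cdf, t - K, t + K, 6000) - K


for t in (-2.0, -0.5, 0.0, 0.7, 3.0):
    print(t, h_by_definition(t), h_closed_form(t))
```

The closed form makes the smoothing visible: $h$ is bounded by $k$ like $\psi_3$, but its derivative $F(t+k) - F(t-k)$ is continuous, whereas $\psi_3'$ jumps at $\pm k$.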


For the one-sample problem the $Y_i$ are assumed to be
symmetric about a point $\theta$ and the dispersion function is
$D(\theta) = \sum_{i<j} \rho(Y_i + Y_j - 2\theta)$. There appears to be no use for
weights here. The assumptions simplify considerably. If $\rho_1$ is
used, the estimate is the median of $(Y_i + Y_j)/2$ for $i < j$.
The effect of the Walsh averages on $\hat\theta$ can be trimmed or
smoothed with other choices of the $\rho$ function.

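For the two-sample problem suppose there are samples of
sizes $n_1$ and $n_2$ from two groups $G_1$ and $G_2$ with locations
$\beta_1$ and $\beta_2$. Write the design matrix as

$$X = \begin{pmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{1} \end{pmatrix},$$

where $\mathbf{0}$ and $\mathbf{1}$ are vectors of zeros and ones, respectively.
Suppose $w_{ij} = w_{11}$ if $i,j$ are both in $G_1$, $w_{ij} = w_{22}$ if $i,j$
are both in $G_2$ and $w_{ij} = w_{12}$ if $i,j$ are in different groups.
Assumptions (A1) - (A6) will hold in this case. The dispersion
function becomes

$$D = w_{11}\sum_{G_1,G_1} \rho(Y_i + Y_j - 2\beta_1) + w_{22}\sum_{G_2,G_2} \rho(Y_i + Y_j - 2\beta_2) + w_{12}\sum_{G_1,G_2} \rho(Y_i + Y_j - \beta_1 - \beta_2).$$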

It appears that $\hat\beta_1$ depends on the data from both groups if
$w_{12} \ne 0$; similarly for $\hat\beta_2$. This differs from the regular
M-estimate method where $\hat\beta_k$ depends only on the data from group
$G_k$, $k = 1, 2$. It can be verified by direct computation that
$C^{-1}VC^{-1} = \Sigma^{-1}$ for any choice of $w_{11}$, $w_{12}$
and $w_{22}$ and as a result these weights have no effect on the
asymptotic variances.

In the simple linear regression model $Y_i = \beta_1 + \beta_2 x_i + e_i$,
$1 \le i \le n$. The dispersion function is

$$D = \sum_{i<j} w_{ij}\,\rho(Y_i + Y_j - 2\beta_1 - (x_i + x_j)\beta_2).$$

In the case of no weights, $w_{ij} = 1$, expressions for $\Sigma$, $C$ and
$V$ are readily computed and (3.1) implies $C^{-1}VC^{-1} = \Sigma^{-1}$, the
familiar matrix for this problem. It is not clear if this can
hold for other choices of weights.


REFERENCES 


Adichie, J.N. (1967), "Estimates of Regression Parameters Based
on Rank Tests," Annals of Mathematical Statistics, 38,
894-904.

Adichie, J.N. (1978), "Rank Tests of Sub-Hypotheses in the
General Linear Regression," Annals of Statistics, 6,
1012-1026.

Hettmansperger, T.P. and McKean, J.W. (1977), "A Robust
Alternative Based on Ranks to Least Squares in Analyzing Linear
Models," Technometrics, 19, 275-284.

Hettmansperger, T.P. and McKean, J.W. (1978), "Statistical
Inference Based on Ranks," Psychometrika, 43, 69-79.

Huber, P.J. (1964), "Robust Estimation of a Location Parameter,"
Annals of Mathematical Statistics, 35, 73-101.

Huber, P.J. (1972), "Robust Statistics," Annals of Mathematical
Statistics, 43, 1041-1067.

Huber, P.J. (1973), "Robust Regression: Asymptotics, Conjectures
and Monte Carlo," Annals of Statistics, 1, 799-821.

Jaeckel, L.A. (1972), "Estimating Regression Coefficients By
Minimizing the Dispersion of the Residuals," Annals of
Mathematical Statistics, 43, 1449-1458.

Schrader, R.M. and Hettmansperger, T.P. (1980), "Robust Analysis of
Variance Based Upon a Likelihood Criterion," Biometrika, 67,
93-101.

Sievers, G.L. (1983), "A Weighted Dispersion Function For
Estimation In Linear Models," Communications in Statistics - Theory
and Methods, A, 12, 1161-1179.


REPORT DOCUMENTATION PAGE

Report Number: Technical Report #72
Title: Robust Estimation Based on Walsh Averages For the General Linear Model
Type of Report: Technical Report 1983
Author: Gerald L. Sievers
Contract or Grant Number: N00014-78-C-0637
Performing Organization: Western Michigan University, Kalamazoo, Michigan 49008
Controlling Office: Office of Naval Research, Statistics and Probability Program
Program Element, Project, Task Area and Work Unit Numbers: NR 042-407
Report Date: November 1983
Number of Pages: 22
Security Classification (of this report): Unclassified
Distribution Statement: Approved For Public Release: Distribution Unlimited
